Overview
Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 7385 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 1168 |
| Duplicate rows (%) | 15.8% |
| Total size in memory | 2.4 MiB |
| Average record size in memory | 343.2 B |
Variable types
| Categorical | 5 |
|---|---|
| Text | 1 |
| Numeric | 3 |
Variable descriptions
| Make | Make is a categorical variable representing the vehicle manufacturer. It contains 42 distinct categories with no missing values. The distribution is moderately imbalanced: the most frequent brands are Ford, Chevrolet, and BMW, while a long tail of less frequent manufacturers accounts for about 46% of the observations. This suggests high categorical diversity, and grouping rare categories or applying appropriate encoding strategies may be necessary in downstream modeling. |
|---|---|
| Model | The variable Model is a high-cardinality text feature (2053 distinct values, 27.8%) with a highly skewed distribution. It combines heterogeneous information such as drivetrain, body type, and trim level, making it unsuitable for direct categorical encoding. Therefore, semantic feature extraction (e.g., drivetrain type, body style, performance indicators) is preferred over direct usage or one-hot encoding. |
| Vehicle Class | Vehicle Class is a low-cardinality categorical feature with 16 distinct categories and no missing values. The distribution is reasonably balanced across major vehicle segments (SUV, compact, mid-size, full-size). |
| Engine Size(L) | Engine Size (L) is a continuous variable ranging from 0.9 to 8.4 liters. The distribution is heavily concentrated at 2.0 L, which accounts for 1460 of the 7385 observations (19.8%). |
| Cylinders | Cylinders is treated as a categorical variable with eight discrete levels, dominated by 4-, 6-, and 8-cylinder engines. |
| Transmission | Transmission is a categorical variable with 27 distinct levels, combining transmission type and number of gears. Automatic transmissions with 6 to 8 gears dominate the dataset. |
| Fuel Type | Fuel Type is a categorical variable with five levels, dominated by regular (X) and premium (Z) gasoline, while alternative fuels (E85, diesel, natural gas) are relatively rare. |
| Fuel Consumption Comb (L/100 km) | The Fuel Consumption Comb (L/100 km) distribution is positively skewed (skewness 0.89), with a mean of 10.98 exceeding the median of 10.6. The middle 50% of vehicles fall within the 8.9–12.6 range (IQR 3.7). The bimodal shape of the histogram suggests the dataset contains two distinct vehicle classes with different efficiency profiles. |
| Interactions | CO2 emissions increase with engine size, showing a strong positive association. The overall trend is approximately linear, but the slope changes: emissions rise more steeply for smaller engines and more gradually for larger ones. Combined fuel consumption and CO2 emissions are also strongly and positively related. The scatter plot shows several nearly parallel linear bands, indicating that CO2 emissions increase approximately linearly with fuel consumption within distinct subgroups that share similar slopes but differ in intercept. Fuel consumption therefore appears to be the primary driver of CO2 emissions, with additional categorical factors, such as fuel type or engine technology, likely shifting emission levels. |
Alerts
| Dataset has 1168 (15.8%) duplicate rows | Duplicates |
|---|---|
| Engine Size(L) is highly overall correlated with CO2 Emissions(g/km) and 2 other fields | High correlation |
| Fuel Consumption Comb (L/100 km) is highly overall correlated with CO2 Emissions(g/km) and 1 other field | High correlation |
| CO2 Emissions(g/km) is highly overall correlated with Engine Size(L) and 1 other field | High correlation |
| Make is highly overall correlated with Cylinders | High correlation |
| Vehicle Class is highly overall correlated with Make and 6 other fields | High correlation |
| Cylinders is highly overall correlated with Engine Size(L) and 1 other field | High correlation |
| Transmission is highly overall correlated with Make and 6 other fields | High correlation |
| Fuel Type is highly overall correlated with Make and 3 other fields | High correlation |
Reproduction
| Analysis started | 2026-01-09 17:08:50.022665 |
|---|---|
| Analysis finished | 2026-01-09 17:08:59.080171 |
| Duration | 9.06 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
Variables
Make
Categorical
High correlation
Make is a categorical variable representing the vehicle manufacturer. It contains 42 distinct categories with no missing values. The distribution is moderately imbalanced: the most frequent brands are Ford, Chevrolet, and BMW, while a long tail of less frequent manufacturers accounts for about 46% of the observations. This suggests high categorical diversity, and grouping rare categories or applying appropriate encoding strategies may be necessary in downstream modeling.
| Distinct | 42 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 455.5 KiB |
| FORD | 628 |
|---|---|
| CHEVROLET | 588 |
| BMW | 527 |
| MERCEDES-BENZ | 419 |
| PORSCHE | 376 |
| Other values (37) | 4847 |
Length
| Max length | 13 |
|---|---|
| Median length | 11 |
| Mean length | 6.1439404 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | ACURA |
|---|---|
| 2nd row | ACURA |
| 3rd row | ACURA |
| 4th row | ACURA |
| 5th row | ACURA |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| FORD | 628 | 8.5% |
| CHEVROLET | 588 | 8.0% |
| BMW | 527 | 7.1% |
| MERCEDES-BENZ | 419 | 5.7% |
| PORSCHE | 376 | 5.1% |
| TOYOTA | 330 | 4.5% |
| GMC | 328 | 4.4% |
| AUDI | 286 | 3.9% |
| NISSAN | 259 | 3.5% |
| JEEP | 251 | 3.4% |
| Other values (32) | 3393 | 45.9% |
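The description of Make suggests grouping rare categories before encoding. A minimal sketch of one way to do that with pandas follows; the function name `group_rare_categories`, the `min_share` threshold, and the `OTHER` label are assumptions chosen for illustration, and the toy column below is not the real distribution.

```python
import pandas as pd


def group_rare_categories(series: pd.Series, min_share: float = 0.03,
                          other_label: str = "OTHER") -> pd.Series:
    """Replace categories whose relative frequency is below min_share."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    # Keep values that are not rare, substitute the label for the rest.
    return series.where(~series.isin(rare), other_label)


# Toy column (hypothetical counts, for illustration only).
make = pd.Series(["FORD"] * 6 + ["BMW"] * 3 + ["BUGATTI"])
grouped = group_rare_categories(make, min_share=0.2)
print(grouped.value_counts())
```

The threshold is a modeling choice: a higher `min_share` gives fewer, denser levels at the cost of merging more manufacturers into one bucket.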
Words
| Value | Count | Frequency (%) |
|---|---|---|
| ford | 628 | 8.3% |
| chevrolet | 588 | 7.8% |
| bmw | 527 | 7.0% |
| mercedes-benz | 419 | 5.6% |
| porsche | 376 | 5.0% |
| toyota | 330 | 4.4% |
| gmc | 328 | 4.3% |
| audi | 286 | 3.8% |
| nissan | 259 | 3.4% |
| jeep | 251 | 3.3% |
| Other values (35) | 3555 | 47.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| E | 4807 | 10.6% |
| O | 3608 | 8.0% |
| A | 3589 | 7.9% |
| R | 3114 | 6.9% |
| I | 2781 | 6.1% |
| D | 2672 | 5.9% |
| N | 2483 | 5.5% |
| C | 2458 | 5.4% |
| S | 2345 | 5.2% |
| M | 2036 | 4.5% |
| Other values (17) | 15480 | 34.1% |
Model
Text
The variable Model is a high-cardinality text feature (2053 distinct values, 27.8%) with a highly skewed distribution. It combines heterogeneous information such as drivetrain, body type, and trim level, making it unsuitable for direct categorical encoding. Therefore, semantic feature extraction (e.g., drivetrain type, body style, performance indicators) is preferred over direct usage or one-hot encoding.
| Distinct | 2053 |
|---|---|
| Distinct (%) | 27.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 496.5 KiB |
Length
| Max length | 41 |
|---|---|
| Median length | 32 |
| Mean length | 11.831957 |
| Min length | 2 |
Unique
| Unique | 502 |
|---|---|
| Unique (%) | 6.8% |
Sample
| 1st row | ILX |
|---|---|
| 2nd row | ILX |
| 3rd row | ILX HYBRID |
| 4th row | MDX 4WD |
| 5th row | RDX AWD |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| awd | 1128 | 6.8% |
| ffv | 592 | 3.6% |
| 4wd | 477 | 2.9% |
| coupe | 375 | 2.3% |
| 4x4 | 333 | 2.0% |
| s | 326 | 2.0% |
| 4matic | 239 | 1.4% |
| cabriolet | 221 | 1.3% |
| xdrive | 215 | 1.3% |
| cooper | 204 | 1.2% |
| Other values (709) | 12464 | 75.2% |
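The description of Model recommends semantic feature extraction over one-hot encoding. The sketch below shows one possible approach using regular expressions on the model string. The token lists are assumptions drawn from the frequent words in this report, and the function and column names are hypothetical.

```python
import pandas as pd

# Token lists are assumptions based on the frequent words reported above.
DRIVE_PATTERN = r"\b(?:AWD|4WD|4X4|4MATIC|XDRIVE)\b"
BODY_PATTERN = r"\b(COUPE|CABRIOLET)\b"


def extract_model_features(model: pd.Series) -> pd.DataFrame:
    """Derive simple flags and a body-style label from the Model text."""
    text = model.str.upper()
    return pd.DataFrame({
        "four_wheel_drive": text.str.contains(DRIVE_PATTERN, regex=True),
        "flex_fuel": text.str.contains(r"\bFFV\b", regex=True),
        "hybrid": text.str.contains(r"\bHYBRID\b", regex=True),
        # NaN when no body-style token is present.
        "body_style": text.str.extract(BODY_PATTERN, expand=False),
    })


# Model strings taken from the sample rows of this report.
models = pd.Series(["ILX", "ILX HYBRID", "MDX 4WD", "RDX AWD", "XC90 T6 AWD"])
features = extract_model_features(models)
print(features)
```

Each derived column has far lower cardinality than the 2053 raw Model values, which is the point of the extraction.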
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 9200 | 10.5% |
| A | 5815 | 6.7% |
| R | 4636 | 5.3% |
| E | 4496 | 5.1% |
| C | 3621 | 4.1% |
| T | 3510 | 4.0% |
| O | 3450 | 3.9% |
| D | 3182 | 3.6% |
| S | 3165 | 3.6% |
| I | 2352 | 2.7% |
| Other values (59) | 43952 | 50.3% |
Vehicle Class
Categorical
High correlation
Vehicle Class is a low-cardinality categorical feature with 16 distinct categories and no missing values. The distribution is reasonably balanced across major vehicle segments (SUV, compact, mid-size, full-size).
| Distinct | 16 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 494.8 KiB |
| SUV - SMALL | 1217 |
|---|---|
| MID-SIZE | 1133 |
| COMPACT | 1022 |
| SUV - STANDARD | 735 |
| FULL-SIZE | 639 |
| Other values (11) | 2639 |
Length
| Max length | 24 |
|---|---|
| Median length | 21 |
| Mean length | 11.587407 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | COMPACT |
|---|---|
| 2nd row | COMPACT |
| 3rd row | COMPACT |
| 4th row | SUV - SMALL |
| 5th row | SUV - SMALL |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| SUV - SMALL | 1217 | 16.5% |
| MID-SIZE | 1133 | 15.3% |
| COMPACT | 1022 | 13.8% |
| SUV - STANDARD | 735 | 10.0% |
| FULL-SIZE | 639 | 8.7% |
| SUBCOMPACT | 606 | 8.2% |
| PICKUP TRUCK - STANDARD | 538 | 7.3% |
| TWO-SEATER | 460 | 6.2% |
| MINICOMPACT | 326 | 4.4% |
| STATION WAGON - SMALL | 252 | 3.4% |
| Other values (6) | 457 | 6.2% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| - | 3042 | 20.8% |
| suv | 1952 | 13.3% |
| small | 1628 | 11.1% |
| standard | 1273 | 8.7% |
| mid-size | 1186 | 8.1% |
| compact | 1022 | 7.0% |
| pickup | 697 | 4.8% |
| truck | 697 | 4.8% |
| full-size | 639 | 4.4% |
| subcompact | 606 | 4.1% |
| Other values (11) | 1883 | 12.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| S | 8335 | 9.7% |
| A | 7531 | 8.8% |
| (space) | 7240 | 8.5% |
| C | 5478 | 6.4% |
| T | 5454 | 6.4% |
| - | 5327 | 6.2% |
| M | 5174 | 6.0% |
| I | 4979 | 5.8% |
| L | 4688 | 5.5% |
| U | 4668 | 5.5% |
| Other values (14) | 26699 | 31.2% |
Engine Size(L)
Real number (ℝ)
High correlation
Engine Size (L) is a continuous variable ranging from 0.9 to 8.4 liters. The distribution is heavily concentrated at 2.0 L, which accounts for 1460 of the 7385 observations (19.8%).
| Distinct | 51 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.1600677 |
| Minimum | 0.9 |
|---|---|
| Maximum | 8.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 57.8 KiB |
Quantile statistics
| Minimum | 0.9 |
|---|---|
| 5-th percentile | 1.5 |
| Q1 | 2 |
| median | 3 |
| Q3 | 3.7 |
| 95-th percentile | 6 |
| Maximum | 8.4 |
| Range | 7.5 |
| Interquartile range (IQR) | 1.7 |
Descriptive statistics
| Standard deviation | 1.3541705 |
|---|---|
| Coefficient of variation (CV) | 0.42852577 |
| Kurtosis | -0.13196328 |
| Mean | 3.1600677 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.80918099 |
| Sum | 23337.1 |
| Variance | 1.8337776 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1460 | 19.8% |
| 3 | 804 | 10.9% |
| 3.6 | 536 | 7.3% |
| 3.5 | 529 | 7.2% |
| 2.5 | 423 | 5.7% |
| 2.4 | 346 | 4.7% |
| 1.6 | 302 | 4.1% |
| 5.3 | 290 | 3.9% |
| 1.8 | 216 | 2.9% |
| 1.4 | 211 | 2.9% |
| Other values (41) | 2268 | 30.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9 | 3 | < 0.1% |
| 1 | 18 | 0.2% |
| 1.2 | 25 | 0.3% |
| 1.3 | 11 | 0.1% |
| 1.4 | 211 | 2.9% |
| 1.5 | 207 | 2.8% |
| 1.6 | 302 | 4.1% |
| 1.8 | 216 | 2.9% |
| 2 | 1460 | 19.8% |
| 2.1 | 5 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.4 | 5 | 0.1% |
| 8 | 3 | < 0.1% |
| 6.8 | 8 | 0.1% |
| 6.7 | 25 | 0.3% |
| 6.6 | 29 | 0.4% |
| 6.5 | 18 | 0.2% |
| 6.4 | 46 | 0.6% |
| 6.3 | 3 | < 0.1% |
| 6.2 | 162 | 2.2% |
| 6 | 94 | 1.3% |
Cylinders
Categorical
High correlation
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.7 KiB |
| 4 | 3220 |
|---|---|
| 6 | 2446 |
| 8 | 1402 |
| 12 | 151 |
| 3 | 95 |
| Other values (3) | 71 |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.0265403 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 4 |
|---|---|
| 2nd row | 4 |
| 3rd row | 4 |
| 4th row | 6 |
| 5th row | 6 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 3220 | 43.6% |
| 6 | 2446 | 33.1% |
| 8 | 1402 | 19.0% |
| 12 | 151 | 2.0% |
| 3 | 95 | 1.3% |
| 10 | 42 | 0.6% |
| 5 | 26 | 0.4% |
| 16 | 3 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 3220 | 42.5% |
| 6 | 2449 | 32.3% |
| 8 | 1402 | 18.5% |
| 1 | 196 | 2.6% |
| 2 | 151 | 2.0% |
| 3 | 95 | 1.3% |
| 0 | 42 | 0.6% |
| 5 | 26 | 0.3% |
Transmission
Categorical
High correlation
Transmission is a categorical variable with 27 distinct levels, combining transmission type and number of gears. Automatic transmissions with 6 to 8 gears dominate the dataset.
| Distinct | 27 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 429.8 KiB |
| AS6 | 1324 |
|---|---|
| AS8 | 1211 |
| M6 | 901 |
| A6 | 789 |
| A8 | 490 |
| Other values (22) | 2670 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 2.5773866 |
| Min length | 2 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | AS5 |
|---|---|
| 2nd row | M6 |
| 3rd row | AV7 |
| 4th row | AS6 |
| 5th row | AS6 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| AS6 | 1324 | 17.9% |
| AS8 | 1211 | 16.4% |
| M6 | 901 | 12.2% |
| A6 | 789 | 10.7% |
| A8 | 490 | 6.6% |
| AM7 | 445 | 6.0% |
| A9 | 339 | 4.6% |
| AS7 | 319 | 4.3% |
| AV | 295 | 4.0% |
| M5 | 193 | 2.6% |
| Other values (17) | 1079 | 14.6% |
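Because each Transmission code combines a transmission type with a gear count, the 27 levels can be split into two lower-cardinality features. The following is a minimal sketch, assuming every code is a run of letters optionally followed by digits (as in `AS6`, `M6`, or `AV`); the function and column names are hypothetical.

```python
import pandas as pd


def split_transmission(codes: pd.Series) -> pd.DataFrame:
    """Split codes such as 'AS6' into a type ('AS') and a gear count (6)."""
    parts = codes.str.extract(r"^(?P<trans_type>[A-Z]+)(?P<gears>\d+)?$")
    # Codes without digits (for example 'AV') get a missing gear count.
    parts["gears"] = pd.to_numeric(parts["gears"]).astype("Int64")
    return parts


# Codes taken from the tables in this report.
codes = pd.Series(["AS5", "M6", "AV7", "AS6", "AV"])
parts = split_transmission(codes)
print(parts)
```

The nullable `Int64` dtype keeps the gear count integer-valued while still allowing a missing value for codes that carry no digit.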
Words
| Value | Count | Frequency (%) |
|---|---|---|
| as6 | 1324 | 17.9% |
| as8 | 1211 | 16.4% |
| m6 | 901 | 12.2% |
| a6 | 789 | 10.7% |
| a8 | 490 | 6.6% |
| am7 | 445 | 6.0% |
| a9 | 339 | 4.6% |
| as7 | 319 | 4.3% |
| av | 295 | 4.0% |
| m5 | 193 | 2.6% |
| Other values (17) | 1079 | 14.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| A | 6200 | 32.6% |
| 6 | 3259 | 17.1% |
| S | 3127 | 16.4% |
| M | 1831 | 9.6% |
| 8 | 1802 | 9.5% |
| 7 | 1026 | 5.4% |
| V | 576 | 3.0% |
| 9 | 419 | 2.2% |
| 5 | 307 | 1.6% |
| 1 | 210 | 1.1% |
| Other values (2) | 277 | 1.5% |
Fuel Type
Categorical
High correlation
Fuel Type is a categorical variable with five levels, dominated by regular (X) and premium (Z) gasoline, while alternative fuels (E85, diesel, natural gas) are relatively rare.
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 418.4 KiB |
| X | 3637 |
|---|---|
| Z | 3202 |
| E | 370 |
| D | 175 |
| N | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Z |
|---|---|
| 2nd row | Z |
| 3rd row | Z |
| 4th row | Z |
| 5th row | Z |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| X | 3637 | 49.2% |
| Z | 3202 | 43.4% |
| E | 370 | 5.0% |
| D | 175 | 2.4% |
| N | 1 | < 0.1% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| x | 3637 | 49.2% |
| z | 3202 | 43.4% |
| e | 370 | 5.0% |
| d | 175 | 2.4% |
| n | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| X | 3637 | 49.2% |
| Z | 3202 | 43.4% |
| E | 370 | 5.0% |
| D | 175 | 2.4% |
| N | 1 | < 0.1% |
Fuel Consumption Comb (L/100 km)
Real number (ℝ)
High correlation
The Fuel Consumption Comb (L/100 km) distribution is positively skewed (skewness 0.89), with a mean of 10.98 exceeding the median of 10.6. The middle 50% of vehicles fall within the 8.9–12.6 range (IQR 3.7). The bimodal shape of the histogram suggests the dataset contains two distinct vehicle classes with different efficiency profiles.
| Distinct | 181 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.975071 |
| Minimum | 4.1 |
|---|---|
| Maximum | 26.1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 57.8 KiB |
Quantile statistics
| Minimum | 4.1 |
|---|---|
| 5-th percentile | 7.2 |
| Q1 | 8.9 |
| median | 10.6 |
| Q3 | 12.6 |
| 95-th percentile | 16.5 |
| Maximum | 26.1 |
| Range | 22 |
| Interquartile range (IQR) | 3.7 |
Descriptive statistics
| Standard deviation | 2.8925063 |
|---|---|
| Coefficient of variation (CV) | 0.2635524 |
| Kurtosis | 1.3935754 |
| Mean | 10.975071 |
| Median Absolute Deviation (MAD) | 1.8 |
| Skewness | 0.89331572 |
| Sum | 81050.9 |
| Variance | 8.3665927 |
| Monotonicity | Not monotonic |
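The quantile and shape statistics above can be recomputed with pandas. The sketch below is only illustrative: the helper name `spread_summary` is hypothetical, and the input is the ten fuel-consumption values from the first sample rows of this report, so the numbers it produces will not match the full-column statistics (this small sample is in fact skewed the other way).

```python
import pandas as pd


def spread_summary(values: pd.Series) -> dict:
    """Return mean, quartiles, IQR, and sample skewness for a numeric column."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    return {
        "mean": values.mean(),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skewness": values.skew(),
    }


# Fuel Consumption Comb values from the first ten sample rows.
fuel_comb = pd.Series([8.5, 9.6, 5.9, 11.1, 10.6, 10.0, 10.1, 11.1, 11.6, 9.2])
stats = spread_summary(fuel_comb)
print(stats)
```

A mean above the median together with a positive skewness, as reported for the full column, indicates a longer right tail.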
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.4 | 145 | 2.0% |
| 8.4 | 136 | 1.8% |
| 9.8 | 135 | 1.8% |
| 9.1 | 132 | 1.8% |
| 10.3 | 130 | 1.8% |
| 8.7 | 128 | 1.7% |
| 11 | 127 | 1.7% |
| 9.9 | 125 | 1.7% |
| 10.7 | 124 | 1.7% |
| 9 | 121 | 1.6% |
| Other values (171) | 6082 | 82.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.1 | 4 | 0.1% |
| 4.2 | 1 | < 0.1% |
| 4.3 | 2 | < 0.1% |
| 4.4 | 2 | < 0.1% |
| 4.5 | 5 | 0.1% |
| 4.7 | 9 | 0.1% |
| 4.8 | 7 | 0.1% |
| 4.9 | 6 | 0.1% |
| 5 | 5 | 0.1% |
| 5.1 | 12 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26.1 | 2 | < 0.1% |
| 25.9 | 2 | < 0.1% |
| 25.8 | 2 | < 0.1% |
| 25.7 | 2 | < 0.1% |
| 23.9 | 1 | < 0.1% |
| 23 | 1 | < 0.1% |
| 22.6 | 4 | 0.1% |
| 22.5 | 1 | < 0.1% |
| 22.2 | 3 | < 0.1% |
| 22.1 | 2 | < 0.1% |
CO2 Emissions(g/km)
Real number (ℝ)
High correlation
| Distinct | 331 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 250.5847 |
| Minimum | 96 |
|---|---|
| Maximum | 522 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 57.8 KiB |
Quantile statistics
| Minimum | 96 |
|---|---|
| 5-th percentile | 169 |
| Q1 | 208 |
| median | 246 |
| Q3 | 288 |
| 95-th percentile | 354 |
| Maximum | 522 |
| Range | 426 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 58.512679 |
|---|---|
| Coefficient of variation (CV) | 0.2335046 |
| Kurtosis | 0.47880085 |
| Mean | 250.5847 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | 0.52609381 |
| Sum | 1850568 |
| Variance | 3423.7336 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 242 | 85 | 1.2% |
| 221 | 82 | 1.1% |
| 214 | 77 | 1.0% |
| 230 | 77 | 1.0% |
| 294 | 76 | 1.0% |
| 232 | 76 | 1.0% |
| 258 | 75 | 1.0% |
| 253 | 75 | 1.0% |
| 246 | 75 | 1.0% |
| 209 | 74 | 1.0% |
| Other values (321) | 6613 | 89.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 96 | 4 | 0.1% |
| 99 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 103 | 1 | < 0.1% |
| 104 | 2 | < 0.1% |
| 105 | 3 | < 0.1% |
| 106 | 2 | < 0.1% |
| 108 | 2 | < 0.1% |
| 109 | 2 | < 0.1% |
| 110 | 7 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 522 | 3 | < 0.1% |
| 493 | 2 | < 0.1% |
| 488 | 1 | < 0.1% |
| 487 | 1 | < 0.1% |
| 485 | 1 | < 0.1% |
| 476 | 1 | < 0.1% |
| 473 | 1 | < 0.1% |
| 467 | 1 | < 0.1% |
| 465 | 3 | < 0.1% |
| 464 | 2 | < 0.1% |
Interactions
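The overview describes nearly parallel linear bands in the fuel consumption versus CO2 scatter and attributes them to subgroups such as fuel type. One way to check this is to fit a separate least-squares line per fuel type, sketched below with NumPy. The rows are copied from the sample and duplicate-row tables in this report, so they cover only fuel types X and Z; the shortened column names and the helper `fit_lines` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# (Fuel Type, Fuel Consumption Comb, CO2 Emissions) rows copied from this report.
rows = pd.DataFrame({
    "fuel_type": ["Z"] * 10 + ["X"] * 7,
    "fuel_comb": [8.5, 9.6, 5.9, 11.1, 10.6, 10.0, 10.1, 11.1, 11.6, 9.2,
                  10.3, 11.0, 9.4, 7.5, 10.8, 9.2, 9.0],
    "co2": [196, 221, 136, 255, 244, 230, 232, 255, 267, 212,
            242, 258, 221, 176, 252, 213, 210],
})


def fit_lines(df: pd.DataFrame) -> dict:
    """Fit co2 = slope * fuel_comb + intercept separately for each fuel type."""
    fits = {}
    for fuel, group in df.groupby("fuel_type"):
        slope, intercept = np.polyfit(group["fuel_comb"], group["co2"], deg=1)
        fits[fuel] = (float(slope), float(intercept))
    return fits


fits = fit_lines(rows)
for fuel, (slope, intercept) in fits.items():
    print(f"{fuel}: slope={slope:.2f} g/km per L/100 km, intercept={intercept:.1f}")
```

Run on the full dataset, the same grouping would return one line per fuel type, making it possible to compare the fitted slopes and intercepts directly against the bands seen in the plot.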
Correlations
| | Engine Size(L) | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|
| Engine Size(L) | 1.000 | 0.817 | 0.851 |
| Fuel Consumption Comb (L/100 km) | 0.817 | 1.000 | 0.918 |
| CO2 Emissions(g/km) | 0.851 | 0.918 | 1.000 |
| | Engine Size(L) | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|
| Engine Size(L) | 1.000 | 0.862 | 0.869 |
| Fuel Consumption Comb (L/100 km) | 0.862 | 1.000 | 0.963 |
| CO2 Emissions(g/km) | 0.869 | 0.963 | 1.000 |
| | Engine Size(L) | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|
| Engine Size(L) | 1.000 | 0.690 | 0.698 |
| Fuel Consumption Comb (L/100 km) | 0.690 | 1.000 | 0.911 |
| CO2 Emissions(g/km) | 0.698 | 0.911 | 1.000 |
| | Make | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|---|---|---|
| Make | 1.000 | 0.804 | 0.839 | 0.892 | 0.901 | 0.735 | 0.706 | 0.762 |
| Vehicle Class | 0.804 | 1.000 | 0.545 | 0.623 | 0.734 | 0.519 | 0.617 | 0.601 |
| Engine Size(L) | 0.839 | 0.545 | 1.000 | 0.812 | 0.655 | 0.373 | 0.681 | 0.709 |
| Cylinders | 0.892 | 0.623 | 0.812 | 1.000 | 0.529 | 0.293 | 0.692 | 0.743 |
| Transmission | 0.901 | 0.734 | 0.655 | 0.529 | 1.000 | 0.617 | 0.608 | 0.597 |
| Fuel Type | 0.735 | 0.519 | 0.373 | 0.293 | 0.617 | 1.000 | 0.659 | 0.373 |
| Fuel Consumption Comb (L/100 km) | 0.706 | 0.617 | 0.681 | 0.692 | 0.608 | 0.659 | 1.000 | 0.953 |
| CO2 Emissions(g/km) | 0.762 | 0.601 | 0.709 | 0.743 | 0.597 | 0.373 | 0.953 | 1.000 |
| | CO2 Emissions(g/km) | Cylinders | Engine Size(L) | Fuel Consumption Comb (L/100 km) | Fuel Type | Make | Transmission | Vehicle Class |
|---|---|---|---|---|---|---|---|---|
| CO2 Emissions(g/km) | 1.000 | 0.477 | 0.869 | 0.963 | 0.164 | 0.381 | 0.262 | 0.285 |
| Cylinders | 0.477 | 1.000 | 0.587 | 0.423 | 0.184 | 0.606 | 0.242 | 0.267 |
| Engine Size(L) | 0.869 | 0.587 | 1.000 | 0.862 | 0.242 | 0.499 | 0.280 | 0.262 |
| Fuel Consumption Comb (L/100 km) | 0.963 | 0.423 | 0.862 | 1.000 | 0.335 | 0.337 | 0.270 | 0.297 |
| Fuel Type | 0.164 | 0.184 | 0.242 | 0.335 | 1.000 | 0.447 | 0.353 | 0.296 |
| Make | 0.381 | 0.606 | 0.499 | 0.337 | 0.447 | 1.000 | 0.418 | 0.359 |
| Transmission | 0.262 | 0.242 | 0.280 | 0.270 | 0.353 | 0.418 | 1.000 | 0.309 |
| Vehicle Class | 0.285 | 0.267 | 0.262 | 0.297 | 0.296 | 0.359 | 0.309 | 1.000 |
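The correlation matrices in this export are not labelled with the coefficient used. As a hedged illustration of how numeric matrices of this kind can be produced, the sketch below computes Pearson and Spearman correlations with pandas on the ten sample rows shown in this report. With so few rows the values will differ from the matrices above.

```python
import pandas as pd

# Numeric columns of the first ten sample rows in this report.
sample = pd.DataFrame({
    "Engine Size(L)": [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7, 2.4],
    "Fuel Consumption Comb (L/100 km)": [8.5, 9.6, 5.9, 11.1, 10.6,
                                         10.0, 10.1, 11.1, 11.6, 9.2],
    "CO2 Emissions(g/km)": [196, 221, 136, 255, 244, 230, 232, 255, 267, 212],
})

# Pearson measures linear association; Spearman works on ranks and is
# therefore less sensitive to outliers and to monotone non-linear trends.
pearson = sample.corr(method="pearson")
spearman = sample.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

Comparing the two coefficients for the same pair of columns is a quick check on whether a relationship is linear or merely monotone.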
Missing values
Sample
| | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA | ILX | COMPACT | 2.0 | 4 | AS5 | Z | 8.5 | 196 |
| 1 | ACURA | ILX | COMPACT | 2.4 | 4 | M6 | Z | 9.6 | 221 |
| 2 | ACURA | ILX HYBRID | COMPACT | 1.5 | 4 | AV7 | Z | 5.9 | 136 |
| 3 | ACURA | MDX 4WD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 11.1 | 255 |
| 4 | ACURA | RDX AWD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 10.6 | 244 |
| 5 | ACURA | RLX | MID-SIZE | 3.5 | 6 | AS6 | Z | 10.0 | 230 |
| 6 | ACURA | TL | MID-SIZE | 3.5 | 6 | AS6 | Z | 10.1 | 232 |
| 7 | ACURA | TL AWD | MID-SIZE | 3.7 | 6 | AS6 | Z | 11.1 | 255 |
| 8 | ACURA | TL AWD | MID-SIZE | 3.7 | 6 | M6 | Z | 11.6 | 267 |
| 9 | ACURA | TSX | COMPACT | 2.4 | 4 | AS5 | Z | 9.2 | 212 |
| | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|---|---|---|---|
| 7375 | VOLVO | S90 T6 AWD | MID-SIZE | 2.0 | 4 | AS8 | Z | 9.6 | 223 |
| 7376 | VOLVO | V60 T5 | STATION WAGON - SMALL | 2.0 | 4 | AS8 | Z | 8.9 | 208 |
| 7377 | VOLVO | V60 T6 AWD | STATION WAGON - SMALL | 2.0 | 4 | AS8 | Z | 9.4 | 219 |
| 7378 | VOLVO | V60 CC T5 AWD | STATION WAGON - SMALL | 2.0 | 4 | AS8 | Z | 9.4 | 220 |
| 7379 | VOLVO | XC40 T4 AWD | SUV - SMALL | 2.0 | 4 | AS8 | X | 9.0 | 210 |
| 7380 | VOLVO | XC40 T5 AWD | SUV - SMALL | 2.0 | 4 | AS8 | Z | 9.4 | 219 |
| 7381 | VOLVO | XC60 T5 AWD | SUV - SMALL | 2.0 | 4 | AS8 | Z | 9.9 | 232 |
| 7382 | VOLVO | XC60 T6 AWD | SUV - SMALL | 2.0 | 4 | AS8 | Z | 10.3 | 240 |
| 7383 | VOLVO | XC90 T5 AWD | SUV - STANDARD | 2.0 | 4 | AS8 | Z | 9.9 | 232 |
| 7384 | VOLVO | XC90 T6 AWD | SUV - STANDARD | 2.0 | 4 | AS8 | Z | 10.7 | 248 |
Duplicate rows
Most frequently occurring
| | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption Comb (L/100 km) | CO2 Emissions(g/km) | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|
| 686 | LEXUS | GS F | COMPACT | 5.0 | 8 | AS8 | Z | 12.5 | 293 | 5 |
| 227 | CHRYSLER | 300 | FULL-SIZE | 3.6 | 6 | A8 | X | 10.3 | 242 | 4 |
| 231 | CHRYSLER | 300 AWD | FULL-SIZE | 3.6 | 6 | A8 | X | 11.0 | 258 | 4 |
| 317 | FIAT | 500L | STATION WAGON - SMALL | 1.4 | 4 | A6 | X | 9.4 | 221 | 4 |
| 531 | INFINITI | QX60 AWD | SUV - SMALL | 3.5 | 6 | AV7 | Z | 10.9 | 257 | 4 |
| 707 | LEXUS | NX 300h AWD | SUV - SMALL | 2.5 | 4 | AV6 | X | 7.5 | 176 | 4 |
| 710 | LEXUS | RC F | SUBCOMPACT | 5.0 | 8 | AS8 | Z | 12.6 | 289 | 4 |
| 712 | LEXUS | RX 350 AWD | SUV - SMALL | 3.5 | 6 | AS8 | X | 10.8 | 252 | 4 |
| 716 | LEXUS | RX 450h AWD | SUV - STANDARD | 3.5 | 6 | AV6 | Z | 7.9 | 185 | 4 |
| 894 | MITSUBISHI | RVR 4WD | SUV - SMALL | 2.0 | 4 | AV6 | X | 9.2 | 213 | 4 |
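Duplicate rows like those listed above can be located and removed with pandas before modeling. The sketch below uses a toy frame with a reduced set of columns and made-up repetition counts; how the report itself counts duplicates may differ from the convention shown here, so the 1168 figure should not be expected to match either number.

```python
import pandas as pd

# Toy frame: values borrowed from the table above, repetition counts invented.
df = pd.DataFrame({
    "Make": ["CHRYSLER", "CHRYSLER", "CHRYSLER", "FIAT"],
    "Model": ["300", "300", "300", "500L"],
    "Fuel Consumption Comb (L/100 km)": [10.3, 10.3, 10.3, 9.4],
    "CO2 Emissions(g/km)": [242, 242, 242, 221],
})

# Rows that repeat an earlier row (the first occurrence is not flagged).
extra_copies = int(df.duplicated(keep="first").sum())
# Every row that belongs to a repeated group, including first occurrences.
rows_in_groups = int(df.duplicated(keep=False).sum())

deduped = df.drop_duplicates().reset_index(drop=True)
print(extra_copies, rows_in_groups, len(deduped))
```

Whether to drop duplicates is a judgment call: identical rows may be distinct vehicles that happen to share every recorded attribute.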